ci: Add automatic flaky test detector #18684
Conversation
node-overhead report 🧳
Note: This is a synthetic benchmark with a minimal express app and does not necessarily reflect the real-world performance impact in an application.
dev-packages/e2e-tests/test-applications/node-express/tests/errors.test.ts
for (const [key, value] of Object.entries(vars)) {
  const pattern = new RegExp(`\\{\\{\\s*env\\.${key}\\s*\\}\\}`, 'g');
  title = title.replace(pattern, value);
  issueBody = issueBody.replace(pattern, value);
String replace special patterns corrupt job name substitution
The replace() calls interpret special dollar-sign sequences in the replacement string ($&, $', $$, $1, etc.) rather than treating them literally. If a job name happens to contain patterns like $& or $1, the issue title and body would contain the matched template placeholder or captured groups instead of the literal job name. While job names rarely contain such patterns, this could cause confusing issue titles when they do.
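A minimal sketch of one possible fix: passing a function as the second argument to `replace()` makes its return value be inserted literally, so dollar-sign sequences in a job name are never interpreted. The `substitute` helper here is hypothetical, wrapping the quoted loop for illustration.

```javascript
// Sketch of a possible fix (hypothetical helper, not the actual workflow code):
// a function replacement is inserted literally, so $-sequences ($&, $1, $$)
// in `value` are not expanded as special replacement patterns.
function substitute(template, vars) {
  let out = template;
  for (const [key, value] of Object.entries(vars)) {
    const pattern = new RegExp(`\\{\\{\\s*env\\.${key}\\s*\\}\\}`, 'g');
    out = out.replace(pattern, () => value);
  }
  return out;
}
```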
I'm not 100% sure on this. The new ticket is unassigned by default and, I think, it doesn't get an SLA set in Linear.
To understand this use case: don't we already get reports within Sentry when there are flaky tests? We could simply adapt the alert: https://sentry.sentry.io/issues/alerts/rules/details/267675/ (which we can also assign to people)
@JPeer264 We only get alerted if there are more than 10 flaky tests detected within an hour, which is a good start but not ideal: the threshold is quite arbitrary, since it depends on how much people are pushing to PRs, for instance. My main goal was to increase the visibility of flaky tests, which I think we get from having them in the issue feed instead of needing to manually check for failing tests and then create issues so somebody can fix them. I am not exactly sure how/when SLAs are assigned, but we can probably get an SLA by applying the correct label? Automatically having someone assigned sounds difficult to me though; maybe flaky CI issues can just be another stream that we look out for during triage. Happy to explore other options as well (e.g. using the alerts we have), I just wanted to put this out to see if we can improve the process around this a bit.
Codecov Results 📊 (generated by Codecov Action)
  per_page: 100
});

const failedJobs = jobs.filter(job => job.conclusion === 'failure');
Timed-out jobs excluded from flaky test detection
Medium Severity
The filter job.conclusion === 'failure' excludes jobs that timed out, which have conclusion === 'timed_out' in the GitHub API. However, the step condition contains(needs.*.result, 'failure') treats timeouts as failures (since needs.*.result maps timeouts to 'failure'). This mismatch means when a test times out on develop, the step runs but finds no matching jobs, causing no issue to be created for potentially flaky tests that intermittently exceed time limits.
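A minimal sketch of the suggested alignment (the `timed_out` conclusion value comes from the GitHub REST API; the helper name is an assumption for illustration):

```javascript
// Treat timeouts like failures, mirroring how `needs.*.result`
// maps a timed-out job to 'failure' in the step condition.
const FLAKY_CONCLUSIONS = new Set(['failure', 'timed_out']);

// Hypothetical helper: keep only jobs whose conclusion counts as a failure.
function selectFailedJobs(jobs) {
  return jobs.filter(job => FLAKY_CONCLUSIONS.has(job.conclusion));
}
```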
This pull request has gone three weeks without activity. In another week, I will close it. But! If you comment or otherwise update it, I will reset the clock, and if you apply the label
Closing due to inactivity after stale warning. Comment or reopen when ready to continue, and use
Semver Impact of This PR
🟢 Patch (bug fixes)

📋 Changelog Preview
This is how your changes will appear in the changelog.

New Features ✨
- Core
- Deps

Bug Fixes 🐛

Internal Changes 🔧

🤖 This preview updates automatically when you update the PR.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Issues created for all failed jobs, not just required ones
- I confirmed the script was including failures from non-required jobs and fixed it by restricting issue creation to failed jobs whose names match the required job set.
Or push these changes by commenting:
@cursor push cd11773317
Preview (cd11773317)
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -1192,7 +1192,28 @@
per_page: 100
});
- const failedJobs = jobs.filter(job => job.conclusion === 'failure');
+ // `listJobsForWorkflowRun` includes non-required jobs as well, so only keep failures from required jobs.
+ const requiredJobNamePatterns = [
+ /^Build$/,
+ /^Browser Unit Tests$/,
+ /^Bun Unit Tests$/,
+ /^Deno Unit Tests$/,
+ /^Node \(\d+\) Unit Tests$/,
+ /^Node \(\d+\)( \(TS .+\))? Integration Tests$/,
+ /^Cloudflare Integration Tests$/,
+ /^Playwright .+ Tests$/,
+ /^PW .+ Tests$/,
+ /^Remix \(Node \d+\) Tests$/,
+ /^E2E .+ Test$/,
+ /^Upload Artifacts$/,
+ /^Lint$/,
+ /^Check file formatting$/,
+ /^Circular Dependency Check$/,
+ /^Size Check$/,
+ ];
+ const failedJobs = jobs.filter(
+ job => job.conclusion === 'failure' && requiredJobNamePatterns.some(pattern => pattern.test(job.name)),
+ );
if (failedJobs.length === 0) {
          console.log('No failed jobs found');

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
Reviewed by Cursor Bugbot for commit e5db420. Configure here.
  per_page: 100
});

const failedJobs = jobs.filter(job => job.conclusion === 'failure');
Issues created for all failed jobs, not just required ones
Medium Severity
listJobsForWorkflowRun returns every job in the workflow run, not just the ones in the needs list. After filtering for conclusion === 'failure', the script creates "Flaky CI" issues for any failed job, including non-required ones like job_optional_e2e_tests and job_node_overhead_check — both of which run on the develop branch but are intentionally excluded from the needs list of job_required_jobs_passed. When a required job fails (triggering this step) and a non-required job also happens to fail, spurious flaky-test issues would be opened for jobs that aren't expected to always pass.
size-limit report 📦
JPeer264 left a comment
Let's try it. If it doesn't work we just revert or adapt 🥇



Manually checking for flakes and opening issues is a bit annoying, so I was thinking we could add a CI workflow to automate this. The action only runs when merging to develop. It could also run on PRs, but that seems unnecessarily complicated. My thinking is that for a push to develop to happen, all the tests must first have passed in the original PR. Therefore, if a test then fails on develop, we know it's a flake. Open for ideas/improvements/cleanups, or let me know if there are any cases I'm missing that could lead to false positives.
Example issue created with this: #18693
It doesn't get all the details, but I think the most important part is a link to the run so we can investigate further. Also, the logic for creating the issues is a bit ugly, but I'm not sure we can make it cleaner, given that I want to create one issue per failed test rather than dump everything into one issue.
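The one-issue-per-job logic could be sketched as a pure payload builder that is easy to test before the GitHub API call. The function name, label, and body wording below are assumptions for illustration, not the actual workflow code:

```javascript
// Hypothetical sketch: build one issue payload per failed job,
// keeping the workflow-run link as the main pointer for investigation.
function buildFlakyIssuePayloads(failedJobs, runUrl) {
  return failedJobs.map(job => ({
    title: `Flaky CI: ${job.name}`,
    body: [
      `The job \`${job.name}\` failed on \`develop\` after its PR run passed, so it is likely flaky.`,
      '',
      `Workflow run: ${runUrl}`,
    ].join('\n'),
    labels: ['flaky-ci'], // assumed label name
  }));
}
```

Each payload could then be passed to `github.rest.issues.create` inside the github-script step, after checking for an existing open issue with the same title to avoid duplicates.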